The recently developed Dynamic Vision Sensors (DVS) sense visual information
asynchronously and code it into trains of events with sub-microsecond temporal
[...] suited for dynamic 3D visual reconstruction, by matching corresponding events generated
by two different sensors in a stereo setup. This paper explores the use of Gabor
filters to extract information about the orientation of the object edges that produce the
[...]

INTRODUCTION

Biological vision systems are known to outperform any modern
artificial vision technology. Traditional frame-based systems
are based on capturing and processing sequences of still frames.
This yields a highly redundant data throughput, imposing
high computational demands. This limitation is overcome in
bio-inspired event-based vision systems, where visual information is
coded and transmitted as events (spikes). This way, much less
redundant information is generated and processed, allowing for
faster and more energy-efficient systems.

Address Event Representation (AER) is a widely used bio-inspired
event-driven technology for coding and transmitting
(sensory) information (Sivilotti, 1991; Mahowald, 1992; Lazzaro
et al., 1993). In AER sensors, each time a pixel senses relevant
information (like a change in the relative light) it asynchronously
sends an event out, which can be processed by event-based
processors (Venier et al., 1997; Choi et al., 2005; Silver et al., 2007;
Khan et al., 2008; Camunas-Mesa et al., 2011, 2012;
Zamarreno-Ramos et al., 2013). This way, the most important features pass
through all the processing levels very fast, as the only delay is
caused by the propagation and computation of events along the
processing network. Also, only pixels with relevant information
send out events, reducing power and bandwidth consumption.
These properties (high speed and low energy) are making AER
sensors very popular, and different sensing chips have been
reported for vision (Lichtsteiner et al., 2008; Lenero-Bardallo
et al., 2010, 2011; Posch et al., 2011; Serrano-Gotarredona and
Linares-Barranco, 2013) or auditory systems (Lazzaro et al., 1993;
Cauwenberghs et al., 1998; Chan et al., 2007).

The development of Dynamic Vision Sensors (DVS) was very
important for high-speed applications. These devices can track
extremely fast objects under standard lighting conditions, providing
an equivalent sampling rate higher than 100 KFrames/s. Exploiting
this fine time resolution provides a new means of achieving stereo
vision with fast and efficient algorithms (Rogister et al., 2012).

Stereovision processing is a very complex problem for conventional
frame-based strategies, due to the lack of precise timing
information as used by the brain to solve such tasks (Meister and
Berry II, 1999). Frame-based methods usually process sets of images
sequentially and independently, searching for several features
like orientation (Granlund and Knutsson, 1995), optical flow
(Gong, 2006) or descriptors of local luminance (Lowe, 2004).
However, event-based systems can compute stereo information
much faster using the precise timing information to match pixels
between different sensors. Several studies have applied event
timing together with additional constraints to compute depth from
stereo visual information (Marr and Poggio, 1976; Mahowald
and Delbruck, 1989; Tsang and Shi, 2004; Kogler et al., 2009;
Dominguez-Morales et al., 2012; Carneiro et al., 2013;
Serrano-Gotarredona et al., 2013).

In this paper, we explore different ways to improve 3D object
reconstruction using Gabor filters to extract orientation information
from the retinas' events. For that, we use two DVS sensors
with high contrast sensitivity (Serrano-Gotarredona and
Linares-Barranco, 2013), whose output is connected to a convolutional
network hardware (Zamarreno-Ramos et al., 2013). Different
Gabor filter architectures are implemented to reconstruct the 3D
shape of objects. In section Neuromorphic Silicon Retina, we
briefly describe the DVS sensor used. Section Stereo Calibration
describes the calibration method used in this work. In section
Event Matching, we detail the matching algorithm applied, while
section 3D Reconstruction shows the method for reconstructing
the 3D coordinates. Finally, section Results provides experimental
results.
FIGURE 1 | Data-driven asynchronous event generation for two
equivalent pixels in Retina 1 and Retina 2. Because of intra-die pixel
mismatch and inter-die sensor mismatch, both response curves differ.



NEUROMORPHIC SILICON RETINA 

The DVS used in this work is an AER silicon retina with 128 x
128 pixels and increased contrast sensitivity, allowing the retina
to detect contrast as low as 1.5% (Serrano-Gotarredona and
Linares-Barranco, 2013). The output of the retina consists of
asynchronous AER events that represent a change in the sensed
relative light. Each pixel independently detects changes in log
intensity larger than a threshold since the last emitted event,
$\theta_{ev} = |I(t) - I(t_{last-spike})| / I(t)$.

The most important property of these sensors is that pixel
information is obtained not synchronously at fixed frame rate
$\delta t$, but asynchronously driven by data at fixed relative light
increments $\theta_{ev}$, as shown in Figure 1. This figure represents the
photocurrent transduced by two pixels in two different retinas in
a stereo setup, configured so that both pixels are sensing an
equivalent activity. Even if both are sensing exactly the same
light, the transduced currents are different, given the change in
initial conditions ($I_{01}$ and $I_{02}$) and the mismatch between retina
pixels, which produce a different response to the same stimulus. As a
consequence, the trains of events generated by these two pixels are
not identical, as represented in Figure 1.

The events generated by the pixels can have either positive
or negative polarity, depending on whether the light intensity
increased or decreased. These events are transmitted off-chip,
timestamped and sent to a computer using a standard USB
connection.

STEREO CALIBRATION 

Before a pair of retinas can be used to sense and match pairs
of corresponding events and to reconstruct each event in 3D, the
relative positions and orientations of both retinas need to be calibrated.



Let us use lower case to denote a 2D point in the retina
sensing plane as $m = [x\ y]^T$, and capital letters to denote the
corresponding 3D point in real space as $M = [X\ Y\ Z]^T$. Augmented
vectors are built by adding 1 as the last element: $\tilde{m} = [x\ y\ 1]^T$ and
$\tilde{M} = [X\ Y\ Z\ 1]^T$. Under the assumptions of the pinhole camera
model, the relationship between $\tilde{m}$ and $\tilde{M}$ is given by (Hartley and
Zisserman, 2003):

$\tilde{m}_i = P_i \tilde{M}$ (1)

where $P_i$ is the projection matrix for camera i. In order to obtain
the projection matrices of a system, many different techniques
have been proposed, and they can be classified into the following
two categories (Zhang, 2000):

• Photogrammetric calibration: using a calibration object with
known geometry in 3D space. This calibration object usually
consists of two or three planes orthogonal to each other
(Faugeras, 1993).

• Self-calibration: the calibration is implemented by moving the 
cameras in a static scene obtaining several views, without using 
any calibration object (Maybank and Faugeras, 1992). 

In this work, we have implemented a calibration technique based
on a known 3D object, consisting of 36 points distributed in two
orthogonal planes. Using this fixed pattern, we calibrate two DVS.
A blinking LED was placed at each one of these 36 points. The LEDs
blinked sequentially one at a time, producing trains of spikes
in several pixels at both sensors. From these trains of spikes,
we needed to extract the 2D calibration coordinates $\tilde{m}_i^j$, where
i = 1, 2 represents each silicon retina and j = 1, ..., 36
represents the calibration points (see Figure 2). There are two different
approaches to obtain these coordinates: with pixel or sub-pixel
resolution. In the first one, we decided that the corresponding 2D
coordinate for a single LED was represented by the pixel which
responded with the highest firing rate. In the second one, we selected
a small cluster of pixels which responded to that LED with a
firing rate above a certain threshold, and we calculated the average
coordinate, obtaining sub-pixel accuracy.
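
The two ways of extracting an LED coordinate can be sketched as follows. This is only a minimal illustration, assuming that the events recorded while a single LED blinks have already been accumulated into a per-pixel spike-count image. The function name `led_coordinate`, the array layout and the threshold value are assumptions made for the example.

```python
import numpy as np

def led_coordinate(spike_counts, threshold):
    """Return (pixel_estimate, subpixel_estimate) as (x, y) tuples.

    spike_counts[y, x] is the number of events fired by pixel (x, y)
    while one LED was blinking (assumed input format).
    """
    # Approach 1: the pixel with the highest spike count (pixel resolution).
    y_max, x_max = np.unravel_index(np.argmax(spike_counts), spike_counts.shape)
    # Approach 2: keep pixels above the threshold and average their coordinates.
    ys, xs = np.nonzero(spike_counts > threshold)
    if xs.size == 0:
        raise ValueError("no pixel exceeds the threshold")
    return (int(x_max), int(y_max)), (float(xs.mean()), float(ys.mean()))

# Toy 128 x 128 count image with one 2 x 2 cluster of active pixels.
counts = np.zeros((128, 128), dtype=int)
counts[40, 60] = 9
counts[40, 61] = 10
counts[41, 60] = 8
counts[41, 61] = 9
pixel, subpixel = led_coordinate(counts, threshold=5)
print(pixel, subpixel)
```

In this toy case the first approach returns the single strongest pixel, while the second returns the centre of the cluster, which falls between pixels.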

After calculating $\tilde{m}_1^j$ and $\tilde{m}_2^j$ (j = 1, ..., 36) and knowing $\tilde{M}^j$,
we can apply any algorithm that was developed for traditional
frame-based computer vision (Longuet-Higgins, 1981) to extract
$P_1$ and $P_2$ (Hartley and Zisserman, 2003). More details can be
found in Calculation of Projection Matrix P in Supplementary
Material.
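
One standard frame-based option for this step is the direct linear transform (DLT), sketched below. It is shown only as an illustration and is not necessarily the procedure of the Supplementary Material. Coordinate normalization is omitted for brevity, and the camera matrix `P_true` and the pattern coordinates are hypothetical values used to check the code on noise-free data.

```python
import numpy as np

def estimate_projection_matrix(points_3d, points_2d):
    """Direct linear transform: find the 3x4 matrix P with m ~ P M.

    Needs at least 6 non-coplanar 3D points; P is recovered up to scale.
    """
    rows = []
    for (X, Y, Z), (x, y) in zip(points_3d, points_2d):
        M = [X, Y, Z, 1.0]
        # Each correspondence gives two linear equations in the 12 entries of P.
        rows.append(M + [0.0] * 4 + [-x * c for c in M])
        rows.append([0.0] * 4 + M + [-y * c for c in M])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)

def project(P, point_3d):
    """Project a 3D point with P and return pixel coordinates (x, y)."""
    m = P @ np.append(point_3d, 1.0)
    return m[:2] / m[2]

# Synthetic check: hypothetical camera, 36 points on two orthogonal planes (cm).
P_true = np.array([[300.0, 0.0, 64.0, 10.0],
                   [0.0, 300.0, 64.0, 5.0],
                   [0.0, 0.0, 1.0, 100.0]])
grid = [(5.0 * i, 3.5 * j) for i in range(1, 7) for j in range(3)]
pattern = np.array([(u, v, 0.0) for u, v in grid] + [(0.0, v, u) for u, v in grid])
pixels = np.array([project(P_true, p) for p in pattern])
P_est = estimate_projection_matrix(pattern, pixels)
max_error = max(np.linalg.norm(project(P_est, p) - q) for p, q in zip(pattern, pixels))
print(f"max reprojection error: {max_error:.2e} pixels")
```

With real, noisy LED coordinates the reprojection error would not vanish, and normalizing the coordinates before the SVD is generally advisable.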

The fundamental matrix F relates the corresponding points
obtained from two cameras, and is defined by the equation:

$\tilde{m}_1^T F \tilde{m}_2 = 0$ (2)

where $\tilde{m}_1$ and $\tilde{m}_2$ are a pair of corresponding 2D points in both
cameras (Luong, 1992). This system can be solved using the 36
pairs of points mentioned before (Benosman et al., 2011).
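
A common way to solve Equation (2) from point pairs is the normalized eight-point algorithm, sketched below under the convention $\tilde{m}_1^T F \tilde{m}_2 = 0$ used in this paper. It is an illustrative choice, not necessarily the solver used by the authors. The two cameras and the random 3D points are hypothetical and serve only to verify the estimate on noise-free data.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: centroid at the origin, mean distance sqrt(2)."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.linalg.norm(pts - centroid, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homogeneous = np.column_stack([pts, np.ones(len(pts))])
    return homogeneous @ T.T, T

def estimate_fundamental_matrix(pts1, pts2):
    """Estimate F such that m1^T F m2 = 0 for every pair of points."""
    n1, T1 = _normalize(pts1)
    n2, T2 = _normalize(pts2)
    # Each row holds the 9 products m1_i * m2_j, matching F flattened row by row.
    A = np.hstack([n1[:, [i]] * n2 for i in range(3)])
    _, _, vt = np.linalg.svd(A)
    Fn = vt[-1].reshape(3, 3)
    # Enforce rank 2, a property of every fundamental matrix.
    u, s, vt = np.linalg.svd(Fn)
    Fn = u @ np.diag([s[0], s[1], 0.0]) @ vt
    # Undo the normalization.
    return T1.T @ Fn @ T2

# Synthetic check: two hypothetical cameras separated by a 14 cm baseline.
rng = np.random.default_rng(0)
K = np.array([[300.0, 0.0, 64.0], [0.0, 300.0, 64.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), [[0.0], [0.0], [100.0]]])
P2 = K @ np.hstack([np.eye(3), [[-14.0], [0.0], [100.0]]])
points = np.column_stack([rng.uniform(-20, 20, (36, 3)), np.ones(36)])
m1 = points @ P1.T
m2 = points @ P2.T
m1 /= m1[:, [2]]
m2 /= m2[:, [2]]
F = estimate_fundamental_matrix(m1[:, :2], m2[:, :2])
residuals = np.abs(np.einsum("ki,ij,kj->k", m1, F, m2))
print(f"largest |m1^T F m2|: {residuals.max():.2e}")
```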

EVENT MATCHING 

In stereo vision systems, a 3D point in space M is projected onto
the focal planes of both cameras in pixels $m_1$ and $m_2$, therefore
generating events $e(m_1, t)$ and $e(m_2, t)$. Reconstructing the
original 3D point requires matching each pair of events produced by
point M at time t (Carneiro et al., 2013). For that, we
implemented two different matching algorithms (A and B) based on
a list of restrictions applied to each event in order to find its
matching pair. These algorithms are described in the following
subsections.

FIGURE 2 | Photograph of the calibration structure, with 36 LEDs
distributed in two orthogonal planes. The size of the object is shown in
the figure.

RETINAS EVENTS MATCHING ALGORITHM (A) 

This first algorithm (Carneiro et al., 2013) consists of applying the
following restrictions (1-4) to the events generated by the silicon
retinas. For each event generated by retina 1, we have
to find out how many events from retina 2 satisfy the 4
restrictions. If exactly one event does, it is considered
its matching pair. Otherwise, it is not possible to determine the
corresponding event, and the event is discarded.

Restriction 1: temporal match 

One of the most useful advantages of event-driven DVS-based
vision sensing and processing is the high temporal resolution,
down to fractions of microseconds (Lichtsteiner et al., 2008;
Posch et al., 2011; Serrano-Gotarredona and Linares-Barranco,
2013). Thus, in theory, two identical DVS cameras observing the
same scene should produce corresponding events simultaneously
(Rogister et al., 2012). However, in practice, there are many
non-ideal effects that end up introducing appreciable time differences
(up to many milliseconds) between corresponding events:

(a) inter-pixel and inter-sensor variability in the light-dependent
latency from the moment a luminance change is sensed by the photodiode
until it is amplified, processed and communicated out of the
chip;

(b) presence of noise at various stages of the circuitry; 

(c) variability in inter-pixel and inter-sensor contrast sensitivity; 
and 

(d) randomness of pixel initial conditions when a change of light 
happens. 




FIGURE 3 | Temporal match. Two events can be considered as candidates
to match if they are generated within a certain time interval $\delta t$.



Nonetheless, corresponding events occur within a time window in the
millisecond range, depending on ambient light (the lower
the light, the wider the time window). As a consequence, this first
restriction implies that for an event $e(m_1, t_1)$, only those events
$e(m_2, t_2)$ with $|t_1 - t_2| < \delta t/2$ can be candidates to match, as
shown in Figure 3. In our experimental setup we used a value
of $\delta t$ = 4 ms, which gave the best possible result under standard
interior lighting conditions.

Restriction 2: epipolar restriction 

As is described in detail in (Hartley and Zisserman, 2003), when a
3D point in space M is projected onto pixel $m_1$ in retina 1, the
corresponding pixel $m_2$ lies on an epipolar line in retina 2 (Carneiro
et al., 2013). Using this property, a second restriction is added to
the matching algorithm using the fundamental matrix F to
calculate the epipolar line $Ep_2$ in retina 2 corresponding to event
$m_1$ in retina 1 ($Ep_2(m_1) = F^T \tilde{m}_1$). Therefore, only those events
$e(m_2, t_2)$ whose distance to $Ep_2$ is less than a given limit $\delta_{epi}$ can
be candidates to match. In our experiments we used a value of
$\delta_{epi}$ = 1 pixel.

Restriction 3: ordering constraint 

For a practical stereo configuration of retinas where the angle 
between their orientations is small enough, a certain geometrical 
constraint can be applied to each pair of corresponding events. 
In general, the horizontal coordinate of the events generated by 
a retina is always larger than the horizontal coordinate of the 
corresponding events generated by the other retina. 

Restriction 4: polarity 

The silicon retinas used in our experimental setup generate output
events when they detect a change in luminance in a pixel. The
polarity of the event indicates whether that change corresponds to
increasing or decreasing luminance (Lichtsteiner et al., 2008;
Posch et al., 2011; Serrano-Gotarredona and Linares-Barranco,
2013). Using the polarity of events, we can impose the condition
that two corresponding events in both retinas must have the same
polarity.
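
The four restrictions can be combined as in the following sketch. It is a minimal illustration under stated assumptions rather than the authors' implementation: the `Event` type, the function names and the sample events are hypothetical, timestamps are in seconds, and the direction of the ordering constraint (retina 1 having the larger horizontal coordinate) is an assumption. The epipolar line is computed as $F^T \tilde{m}_1$, following Equation (2).

```python
import numpy as np
from typing import NamedTuple

class Event(NamedTuple):
    x: float
    y: float
    t: float        # timestamp in seconds
    polarity: int   # +1 or -1

def epipolar_distance(F, e1, e2):
    """Distance in pixels from e2 to the epipolar line F^T m1 in retina 2."""
    a, b, c = F.T @ np.array([e1.x, e1.y, 1.0])
    return abs(a * e2.x + b * e2.y + c) / np.hypot(a, b)

def match_event(e1, events2, F, dt=4e-3, d_epi=1.0):
    """Return the unique retina-2 event meeting restrictions 1-4, else None."""
    candidates = [
        e2 for e2 in events2
        if abs(e1.t - e2.t) < dt / 2                 # 1: temporal match
        and epipolar_distance(F, e1, e2) < d_epi     # 2: epipolar restriction
        and e1.x > e2.x                              # 3: ordering (assumed direction)
        and e1.polarity == e2.polarity               # 4: polarity
    ]
    # Ambiguous or empty candidate sets are discarded.
    return candidates[0] if len(candidates) == 1 else None

# Toy fundamental matrix of a perfectly aligned pair: m1^T F m2 = y2 - y1.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
e1 = Event(70, 50, 0.0100, 1)
events2 = [
    Event(45, 50, 0.0105, 1),    # satisfies all four restrictions
    Event(45, 50, 0.0110, -1),   # wrong polarity
    Event(46, 80, 0.0101, 1),    # far from the epipolar line
    Event(44, 50, 0.0200, 1),    # outside the time window
]
match = match_event(e1, events2, F)
print(match)
```

If a second event of retina 2 also satisfied all four restrictions, the function would return `None`, which corresponds to discarding the event.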
FIGURE 4 | Illustration of the use of 3 Gabor filters with different
orientations to the output of both retinas. The events generated by the
filters carry additional information, as they represent the orientation of the
edges.



GABOR FILTER EVENTS MATCHING ALGORITHM (B) 

We propose a new algorithm where we use the orientation of the 
object edges to improve the matching, increasing the number of 
correctly matched events. 

If the focal planes of two retinas in a stereo vision system are
roughly vertically aligned and have a small horizontal vergence,
the orientation of observed edges will be approximately equal
provided that the object is not too close to the retinas. A static
DVS produces events when observing moving objects, or more
precisely, when observing the edges of moving objects. Therefore,
corresponding events in the two retinas are produced by the same
moving edges, and consequently the observed orientation of the
edge should be similar in both retinas. An edge would appear
with a different angle in both retinas only when it is relatively
close to them, and in practice this does not happen for
two reasons¹:

(1) Since both cameras have small horizontal vergence, the object
would be out of the overlapping field of view of the 2 retinas
long before being so close. In that case, we do not have stereo
vision anymore.

(2) The minimal focusing distance of the cameras' lenses limits 
the maximal vergence. 

Considering that, we can assume that the orientation of an edge
will be approximately the same in both retinas under our working
conditions. Under different conditions, an epipolar rectification
should be applied to the stereo system to ensure that the orientations
of the edges are identical in the two cameras. This operation
consists of estimating the homographies mapping and scaling
the events of each retina into two focal planes parallel to the
stereo baseline (Loop and Zhang, 1999). Lines in the rectified
focal planes are precisely the epipolar lines of the stereo system.
This rectification should be carried out at the same time as the
retinas calibration.

The application of banks of Gabor filters to the events generated
by both retinas provides information about the orientation
of the object edges that produce the events, as shown in Figure 4.
This way, by using Gabor filters with different angles we can apply
the previously described matching algorithm to pairs of Gabor
filters with the same orientation. Thus, the new matching
algorithm is as follows. The events coming out of retinas $R_1$ and $R_2$
are processed by Gabor filters $G_{1x}$ and $G_{2x}$, respectively (with
x = 1, 2, ..., N, where N is the number of orientation filters for
each retina). Then, for each pair of Gabor filters $G_{1x}$ and $G_{2x}$,
conditions 1-4 are applied to obtain matched events for each
orientation. Therefore, the final list of matched events will be
obtained as the union of all the lists of matched events obtained
for each orientation.
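
The structure of algorithm B can be sketched as a per-orientation application of a matcher followed by a union of the results. In this illustration the function `match_per_orientation`, the event tuples `(x, y, t)` and the simplified `toy_match` function (a stand-in for restrictions 1-4 that only checks timing and image row) are all assumptions made for the example.

```python
from collections import defaultdict

def match_per_orientation(filter_events1, filter_events2, match_fn):
    """Match events only between Gabor filters of the same orientation.

    filter_events1, filter_events2: iterables of (orientation_index, event).
    match_fn(e1, candidates) returns the matched event or None.
    """
    by_orientation = defaultdict(list)
    for k, e2 in filter_events2:
        by_orientation[k].append(e2)
    matches = []
    for k, e1 in filter_events1:
        e2 = match_fn(e1, by_orientation[k])
        if e2 is not None:
            matches.append((k, e1, e2))   # union over all orientations
    return matches

def toy_match(e1, candidates, dt=4e-3):
    """Simplified stand-in: unique candidate in the same row and time window."""
    close = [e2 for e2 in candidates
             if abs(e1[2] - e2[2]) < dt / 2 and e1[1] == e2[1]]
    return close[0] if len(close) == 1 else None

# Two edges with different orientations crossing the same row at the same time.
ev1 = [(0, (70, 50, 0.0100)), (1, (72, 50, 0.0100))]
ev2 = [(0, (45, 50, 0.0105)), (1, (47, 50, 0.0102))]
matches = match_per_orientation(ev1, ev2, toy_match)
ambiguous = toy_match(ev1[0][1], [e for _, e in ev2])
print(len(matches), ambiguous)
```

In this toy case both events are matched when orientations are kept separate, whereas pooling all retina 2 events leaves two candidates for the first event, so it would be discarded.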

¹ There is, however, a "pathological" exception: a very thin and long object,
perfectly centred between the two retinas, having its long dimension
perpendicular to the retina planes, may produce different angles at both retinas.

3D RECONSTRUCTION

The result provided by the previously described matching
algorithm is a train of pairs of corresponding events. Each pair
consists of two events with coordinates $m_1 = (x_1, y_1)^T$ and
$m_2 = (x_2, y_2)^T$. The relationship between $\tilde{m}$ and $\tilde{M}$ for both retinas is
given by:

$\tilde{m}_1 \times P_1 \tilde{M} = 0$
$\tilde{m}_2 \times P_2 \tilde{M} = 0$ (3)

where $P_1$ and $P_2$ represent the projection matrices calculated
during calibration, and $\tilde{M}$ is the augmented vector corresponding to
the 3D coordinate that must be obtained. These equations can
be solved as a linear least squares minimization problem (Hartley
and Zisserman, 2003), giving the final 3D coordinates
$M = [X\ Y\ Z]^T$ as a solution. More details can be found in Calculation
of Reconstructed 3D Coordinates in Supplementary Material.
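
A minimal sketch of this least squares step is given below. Each cross product in Equation (3) contributes two independent linear equations in (X, Y, Z), and the resulting 4 x 3 system is solved with `numpy.linalg.lstsq`. The projection matrices and the test point are hypothetical values used only to check the code; the exact formulation in the Supplementary Material may differ.

```python
import numpy as np

def triangulate(m1, m2, P1, P2):
    """Solve m1 x (P1 M) = 0 and m2 x (P2 M) = 0 for M = (X, Y, Z)."""
    rows = []
    for (x, y), P in ((m1, P1), (m2, P2)):
        # Two independent equations of the cross product for each retina.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.array(rows)
    # Inhomogeneous form: A[:, :3] @ [X, Y, Z] = -A[:, 3].
    solution, *_ = np.linalg.lstsq(A[:, :3], -A[:, 3], rcond=None)
    return solution

def project(P, point_3d):
    """Project a 3D point with P and return pixel coordinates (x, y)."""
    m = P @ np.append(point_3d, 1.0)
    return m[:2] / m[2]

# Hypothetical aligned stereo pair with a 14 cm baseline (units: cm).
K = np.array([[300.0, 0.0, 64.0], [0.0, 300.0, 64.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), [[-14.0], [0.0], [0.0]]])
M_true = np.array([5.0, -3.0, 100.0])
M_est = triangulate(project(P1, M_true), project(P2, M_true), P1, P2)
print(M_est)
```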

RESULTS 

In this Section, we describe briefly the hardware setup used for the 
experiments, then we show a comparison between the different 
calibration methods, after that we characterize the 3D reconstruc- 
tion method, and finally we present results on the reconstruction 
of 3D objects. 

HARDWARE SETUP 

The event-based stereo vision processing has been tested using
two DVS sensor chips (Serrano-Gotarredona and Linares-Barranco,
2013) whose outputs are connected to a merger board
(Serrano-Gotarredona et al., 2009), which sends the events to a
2D grid array of event-based convolution modules implemented
within a Spartan6 FPGA. This scheme has been adapted from a
previous one that used a Virtex6 (Zamarreno-Ramos et al., 2013).
The Spartan6 was programmed to perform real-time edge
extraction on the visual flow from the retinas. Finally, a USBAERmini2
board (Serrano-Gotarredona et al., 2009) was used to timestamp
all the events coming out of the Spartan6 board and send them to
a computer through a high-speed USB2.0 port (see Figure 5).

The implementation of each convolution module in the FPGA
is represented in Figure 6. It consists of two memory blocks (one
to store the pixel values, and the other to store the kernel), a
control block that performs the operations, a configuration block
that receives all the programmable parameters, and an output
block that sends out the events. When an input event arrives, it is
received by the control block, which implements the handshaking
and calculates which memory positions must be affected by the
operation. In particular, it must add the kernel values to the pixels
belonging to the appropriate neighborhood around the address
of the input event, as done in previous event-driven convolution
processors (Serrano-Gotarredona et al., 1999, 2006, 2008, 2009;
Camunas-Mesa et al., 2011, 2012). At the same time, it checks
if any of the updated pixels has reached its positive or negative
threshold, in which case it resets the pixel and sends a signed
event to the output block. A programmable forgetting process
periodically decreases the value of all the pixels linearly, making
the pixels behave like leaky integrate-and-fire neurons.
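
The behavior of one convolution module can be modeled in software as in the sketch below. It is a simplified illustration, not the FPGA design: the class name `ConvModule`, the symmetric threshold, the leak step and the kernel values are assumptions chosen for the example.

```python
import numpy as np

class ConvModule:
    """Software model of an event-driven convolution module (illustrative)."""

    def __init__(self, kernel, size=128, threshold=50.0, leak=1.0):
        self.kernel = np.asarray(kernel, dtype=float)
        self.state = np.zeros((size, size))   # pixel memory
        self.threshold = threshold            # assumed symmetric +/- threshold
        self.leak = leak                      # amount removed per forgetting step

    def process(self, x, y, polarity=1):
        """Add the kernel around (x, y); return output events (x, y, sign)."""
        kh, kw = self.kernel.shape
        y0, x0 = y - kh // 2, x - kw // 2
        # Clip the kernel window to the borders of the pixel array.
        ys = slice(max(y0, 0), min(y0 + kh, self.state.shape[0]))
        xs = slice(max(x0, 0), min(x0 + kw, self.state.shape[1]))
        k = self.kernel[ys.start - y0:ys.stop - y0, xs.start - x0:xs.stop - x0]
        window = self.state[ys, xs]           # view into the pixel memory
        window += polarity * k
        out = []
        for dy, dx in zip(*np.nonzero(np.abs(window) >= self.threshold)):
            sign = int(np.sign(window[dy, dx]))
            out.append((int(xs.start + dx), int(ys.start + dy), sign))
            window[dy, dx] = 0.0              # reset the pixel that fired
        return out

    def forget(self):
        """Linear decay of every pixel toward zero (leaky behavior)."""
        magnitude = np.maximum(np.abs(self.state) - self.leak, 0.0)
        self.state = np.sign(self.state) * magnitude

module = ConvModule(kernel=np.full((3, 3), 30.0))
first = module.process(10, 10)    # pixels reach 30, below threshold
second = module.process(11, 10)   # overlapping pixels reach 60 and fire
module.forget()
print(first, len(second), module.state[10, 9])
```

Calling `forget` periodically reproduces the forgetting process described above; in hardware the period and the decrement are programmable parameters.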

Several convolutional modules can be arranged in a 2D mesh,
each one communicating bidirectionally with all four neighbors,
as illustrated in Figure 7 (Zamarreno-Ramos et al., 2013). Each
module is characterized by its module coordinate within the
array. Address events are augmented by adding either the source
or destination module coordinate. Each module includes an AER
router which decides how to route the events (Zamarreno-Ramos
et al., 2013). This way, any network architecture can be
implemented, like the one shown in Figure 4 with any number of Gabor
filters. Each convolutional module is programmed to extract a
specific orientation by writing the appropriate kernel. In our
experiments, the resolution of the convolutional blocks is 128 x
128 pixels.

In order to compensate for the mismatch between the two DVS
chips, an initial procedure must be implemented. This procedure
consists of setting the values of the bias signals which control the
sensitivity of the photosensors to obtain approximately the same
number of events in response to a fixed stimulus in both retinas.

CALIBRATION RESULTS 

FIGURE 5 | Experimental stereo setup.

FIGURE 6 | Block diagram for the convolutional block implemented on
FPGA.

FIGURE 7 | Block diagram for a sample network with 3 x 3
convolutional blocks implemented on FPGA.

FIGURE 8 | 3D reconstruction of the coordinates of the calibration
LEDs. (A) With pixel resolution and (B) with sub-pixel resolution. Blue
circles represent the real location of the LEDs, while red crosses indicate
the reconstructed coordinate. (C,D) Show the absolute value of the
measured errors in cm for approaches 1 and 2, respectively. Red lines
represent the mean error.

FIGURE 9 | Measurement of the disparity (distance) between a pixel in
Retina 1 and its corresponding epipolar line in Retina 2. The minimum
disparity point separates Region A and B.

FIGURE 10 | Characterization of the 3D reconstruction of the epipolar
lines for different pixels in Retina 1. Each color represents a different
pixel. (A) Distance between the reconstructed points and the retinas for
different disparity values. The dashed lines represent the upper and lower
limits associated to the allowed deviation around the epipolar line.
(B) Reconstruction error for 3D points closer to the retinas, Region A.
(C) Reconstruction error for points farther from the retinas, Region B.

In order to calibrate the setup with both DVS retinas (with a
baseline distance of 14 cm, the retinas being approximately aligned
and the focal length of the lenses being 8 mm), we built a structure of
36 blinking LEDs distributed in two orthogonal planes, each with
an array of 6 x 3 LEDs with known 3D coordinates in each plane
(see Figure 2). The horizontal distance between LEDs is 5 cm,
while the vertical separation is 3.5 cm. This structure was placed
in front of the DVS stereo setup at approximately 1 m distance,
and the events generated by the retinas were recorded by the
computer. The LEDs blinked sequentially, so that when one LED
produces events no other LED is blinking. This way, during a
simultaneous event burst in both cameras, there is only one LED
in 3D space blinking, resulting in a unique spatial correspondence
between the events produced in both retinas and the original 3D
position. This recording was processed offline to obtain the 2D
coordinates of the LEDs projected in both retinas following two
different approaches:

(1) We represent a 2D image coding the number of spikes gener- 
ated by each pixel. This way for each LED we obtain a cluster 
of pixels with large values. The coordinate of the pixel with 
the largest value in each cluster is considered to be the 2D 
projection of the LED. The accuracy of this measurement is 
one pixel. 

(2) Using the same 2D image, the following method is applied.
First, all those pixels with a number of spikes below a
certain threshold are set to zero, while all those pixels above
the threshold are set to one, obtaining a binarization of the
image. Figure S1 in Calculation of Projection Matrix P in
Supplementary Material shows an example of a 2D binarized
image obtained for one DVS, where the 36 clusters represent
the responses to the blinking LEDs. Then, for each cluster
of pixels we calculate the mean coordinate, obtaining the 2D
projection of the LEDs with sub-pixel resolution.

FIGURE 11 | Kernels used for the 4-orientation configuration. Each row
represents a different scale (from smaller to larger kernels). The maximum
kernel value is 15 and the minimum is -7. Kernel size is 11 x 11 pixels.

FIGURE 12 | Photograph of the three objects used to test the 3D
reconstruction algorithm: a pen, a ring, and a cube.

In both cases, these 2D coordinates together with the known 3D
positions of the LEDs in space are used to calculate the
projection matrices $P_1$ and $P_2$, and the fundamental matrix F, following
the methods described in section Stereo Calibration. To
validate the calibration, $P_1$ and $P_2$ were used to reconstruct the 3D
calibration pattern following the method described in section 3D
Reconstruction, obtaining the results shown in Figures 8A,B. The
reconstruction error is measured as the distance between each
original 3D point and its corresponding reconstructed position,
giving the results shown in Figures 8C,D. As can be seen in the
figure, the mean reconstruction error for approach 1 is 7.3 mm
with a standard deviation of 4.1 mm, while for approach 2 it
is only 2 mm with a standard deviation of 1 mm. This error is
comparable to the size of each LED (1 mm).

Table 1 | Comparison of the 3D reconstruction results for the pen.

The first column (0 orientations) presents the results obtained applying the matching algorithm to the retinas events (algorithm A, section Event Matching), while
the rest of the columns are related to the pair-wise application of the matching algorithm to the outputs of the Gabor filters (algorithm B, section Event Matching),
from Scale 1 (smaller kernels) to Scale 4 (larger kernels). For each scale, different numbers of orientations are considered (from 2 to 8), as indicated in the first
row (Orientations). Second row (Nev) shows the number of events processed (in Kevents) by the matching algorithm in each case (i.e., the total number of events
generated by all the filters). Third row (Nm) presents the number of matched events (in Kevents) produced by the algorithm, while fourth row (Matching Rate) shows
the ratio of matched events over the total number of events generated by the Gabor filters (Matching Rate = 100 · Nm/Nev, in %). Fifth row (Isolated events) shows
the ratio of isolated events over the total number of matched events (in %). Sixth row (Merr) presents the ratio of wrongly matched events over the total number
of matched events (in %). The last row (Nm-correct) combines the number of matched events with the ratios of isolated and wrongly matched events, presenting
the number of correctly matched events (Nm-correct = Nm - (Isolated events/100) · Nm - (Merr/100) · Nm, in Kevents).

PRECISION CHARACTERIZATION 

Using the calibration results obtained in the previous subsection,
we performed the following evaluation of the 3D reconstruction
method. For a fixed pixel $m_1$ in Retina 1, we used the fundamental
matrix F to calculate the corresponding epipolar line in Retina
2, $Ep_2$, represented in Figure 9. Although a perfect alignment
between the two retinas would produce an epipolar line parallel
to the X-axis and crossing the pixel position [minimum disparity
point coincident with $(x_1, y_1)$], we represent a more general case,
where the alignment is performed manually and is not perfect.
This case is illustrated in Figure S1 (see Calculation of Projection
Matrix P in Supplementary Material), where we show the 2D
images representing the activity recorded by both retinas during
calibration. The orientations of the epipolar lines indicate that
the alignment is not perfect. The mean disparity for the LED
coordinates is 24.55 pixels. Considering that we admit a deviation
around the epipolar line of $\delta_{epi}$ = 1 pixel in the matching
algorithm, we calculated two more lines, an upper and a lower
limit, given by a distance of ±1 pixel to the epipolar line. Using
projection matrices $P_1$ and $P_2$, we reconstructed the 3D
coordinates for all the points in these three lines. We repeated the
procedure for a total of four different pixels in Retina 1, $m_1^i$ (i =
1, 2, 3, 4), distributed around the visual space, obtaining four
sets of 3-dimensional lines.

In Figure 10A, we represent the distance between these 3D points
and the retinas for each disparity
value [the disparity measures the 2D Euclidean distance between
the projections of a 3D point in both retinas, $(x_1, y_1)$ and $(x_2, y_2)$],
where each color corresponds to a different pixel $m_1^i$ in Retina 1,
and the dashed lines represent the upper and lower limits given
by the tolerance of 1 pixel around the epipolar lines. As can be
seen in the figure, each disparity has two different distance values
associated, which represent the two possible points in $Ep_2$
which are at the same distance from $m_1$. This effect results in two
different zones in each trace (regions A and B in Figure 9), which
correspond to two different regions in the 3D space, where the
performance of the reconstruction changes drastically. Therefore,
we consider both areas separately in order to estimate the
reconstruction error. Using the range of distances given by Figure 10A
between each pair of dashed lines, we calculate the reconstruction
error for each disparity value as $(d_{max} - d_{min})/\mu_d$, where $d_{max}$
and $d_{min}$ represent the limits of the range of distance at that point,
and $\mu_d$ is the mean value. Figure 10B shows the obtained error for
the 3D points located in the closer region (A), while Figure 10C
corresponds to the points farther from the retinas (Region B). In
both figures, each line represents a different pixel $m_1^i$ in Retina 1.
As shown in Figure 10B, the reconstruction error in the area of
interest (around 1 m distance from the retinas) is less than 1.5%.
Note that the minimum disparity value is around 20 pixels (while
a perfect alignment would give 0), showing the robustness of the
method for manual approximate alignment.

FIGURE 13 | Illustration of enhancing edges and noise reduction by a
Gabor filter. (A) Input events representing a discontinuous edge with
noise. (B) Output events generated by the Gabor filter, with the
reconstructed edge without noise. (C) Gabor kernel. All axes represent
pixels, the visual space in (A,B) being 128 x 128 and the size of the kernel
in (C) 11 x 11.

3D RECONSTRUCTION 

For the experimental evaluation of the 3D reconstruction, we analyzed
the effect of several configurations of Gabor filters on the event
matching algorithm B in order to compare them to algorithm A. For each
configuration, we tested different numbers of Gabor filter
orientations (from 2 to 8). All filters in a configuration had the
same spatial scale, and we tested 4 different scales. Identical filters
were applied to both retina outputs. Each row in Figure 11 shows an
example of the kernels used in a configuration of 4 orientations
(90°, 45°, 0°, and −45°), with one row per spatial scale. In general,
the angles implemented in each case are uniformly distributed between
90° and −90°. This strategy was used to reconstruct in 3D the three
objects shown in Figure 12: a 14 cm pen, a ring 22 cm in diameter, and
a metal wire cube structure with 15 cm sides.
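
As a minimal sketch of the orientation sampling described above, the
following Python snippet lists the filter angles for a bank of N
orientations. The function name `filter_orientations` is hypothetical,
and the rule that every bank starts at 90° and steps by 180°/N is an
assumption extrapolated from the 4-orientation example; the text only
states that the angles are uniformly distributed.

```python
def filter_orientations(n):
    """Return n angles (degrees) spread uniformly over 180 degrees.

    Assumption: the first filter is at 90 degrees and the step is
    180/n, which reproduces the 4-orientation example in the text.
    """
    step = 180.0 / n
    return [90.0 - k * step for k in range(n)]


# Angles for every bank size tested in the experiments (2 to 8).
for n in range(2, 9):
    print(n, filter_orientations(n))
```

With this convention, a bank of 8 filters has an angular step of
22.5°, which is the finest orientation resolution among the tested
configurations.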

Pen 

A swinging pen of 14 cm length was moved in front of the two retinas
for half a minute, generating approximately 100 Kevents per retina.
Table 1 summarizes the results of the 3D reconstruction, in terms of
events. The column labeled "Orientations 0" corresponds to applying the
matching algorithm directly to the retina pair outputs (algorithm A).
When using Gabor filters (algorithm B), experiments with four different
scales were conducted. For each scale, different numbers of
simultaneous filter orientations were tested, ranging from 2 to 8. In
order to compare the performance of the stereo matching algorithm
applied directly to the retinas (algorithm A, see section Event
Matching) and applied to the outputs of the Gabor filters (algorithm B,
see section Event Matching), the second row in Table 1 ($N_{ev}$) shows
the number of events processed by the algorithm in both cases. We show
only the number of events coming originally from Retina 1, as both
retinas were configured to generate approximately the same number of
events for a given stimulus.

When the algorithm is applied directly to the output of the retinas,
the number of matched pairs of events obtained is around 28 Kevents
(a 28% success rate). The third row in Table 1 ($N_m$) shows the number
of matched events for the different configurations of Gabor filters.
The fourth row of Table 1 (Matching Rate) gives the corresponding
success rate for each filter configuration, for comparison with the
28% obtained from the retinas alone.

Although these results show that the matching rate of the algorithm is
smaller when we use Gabor filters to extract information about the
orientation of the edges that generated the events, we should consider
that the performance of 3D reconstruction is determined by the total
number of matched events, not the relative proportion. Note that the
Gabor filters are capable of edge filling when detecting somewhat
sparse or incomplete edges from the retina, thus enhancing edges and
providing more events for these edges. Figure 13 shows an example where
a weak edge (in Figure 13A) produced by a retina together with noise
events is filled by a Gabor filter (with the kernel shown in
Figure 13C), producing the enhanced noise-free edge in Figure 13B and
increasing the number of edge events from 24 to 70 while removing all
retina-noise events. The more matched events, the better the 3D
reconstruction. For that reason, we consider that a bank of 8 Gabor
filters with kernels of scale 4 gives the best result, with more than
39 Kevents that can be used to reconstruct the 3D sequence out of the
100 Kevents generated by the retinas. This application of Gabor filters
for edge filling was first demonstrated in Lindenbaum et al. (1994),
and has also been used for fingerprint image enhancement (Hong et al.,
1998; Greenberg et al., 2002).
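
The edge-filling and noise-rejection behavior can be illustrated with
a small frame-based sketch in Python. This is not the event-driven
convolution used in the experiments: it accumulates events into a
binary image, correlates it with one Gabor kernel, and thresholds the
result. The 11 x 11 kernel size follows Figure 13C; the helper names,
the kernel parameters (`sigma`, `wavelength`, `gamma`), the 32 x 32
test image, and the threshold of 2.0 are all assumptions chosen for
illustration.

```python
import numpy as np


def gabor_kernel(size=11, theta_deg=0.0, sigma=2.0, wavelength=4.0, gamma=0.5):
    """Real (cosine) Gabor kernel. Parameter values are illustrative only.

    theta_deg is the direction of the cosine carrier; the kernel responds
    most strongly to edges perpendicular to it (vertical for theta_deg=0).
    """
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    theta = np.deg2rad(theta_deg)
    xr = xs * np.cos(theta) + ys * np.sin(theta)    # across the edge
    yr = -xs * np.sin(theta) + ys * np.cos(theta)   # along the edge
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength)


def correlate_same(image, kernel):
    """Zero-padded 2D correlation returning an array shaped like image."""
    half = kernel.shape[0] // 2
    padded = np.pad(image, half)
    out = np.zeros_like(image, dtype=float)
    rows, cols = image.shape
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[i:i + rows, j:j + cols]
    return out


# Discontinuous vertical edge: one event on every other row of column 16.
frame = np.zeros((32, 32))
frame[4:28:2, 16] = 1.0
# Three isolated noise events far from the edge.
noise = [(5, 5), (20, 28), (28, 8)]
for r, c in noise:
    frame[r, c] = 1.0

response = correlate_same(frame, gabor_kernel())
filled = response > 2.0  # assumed firing threshold

print("input edge events :", int(frame[:, 16].sum()))
print("output edge pixels:", int(filled[:, 16].sum()))
print("noise survives    :", any(filled[r, c] for r, c in noise))
```

A single noise event can contribute at most the kernel peak value to
any pixel, so it stays below the threshold, whereas neighboring edge
events add up along the elongated axis of the kernel and fill the gaps.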

FIGURE 15 | Graphical representation of Table 1. Each subplot
corresponds to a different row of the table, showing the obtained
values for each number of orientations (horizontal axis) and scale. The
black horizontal lines indicate the values obtained using algorithm A
(0 orientations).

FIGURE 17 | Result of the 3D reconstruction of the swinging pen
recording. Each plot (from A to I) corresponds to a 50 ms frame
representation of the 3D coordinates of the matched events.

Another parameter that can be used to measure the quality of the 3D
reconstruction is the proportion of "isolated" events in the matched
sequence. We define an isolated event as an event which is not
correlated to any other event in a certain spatio-temporal window,
meaning that no other event has been generated in its neighboring
region within a limited time range. A non-isolated event (an event
generated by an edge of the object) will be correlated to some other
events generated by the same edge, which will be close in space and
time. Note that these isolated matched events correspond to false
matches. These false matches can be produced when an event in one
retina is matched by mistake with a noise event in the other retina, or
when two or more events that happen nearly simultaneously in 3D space
are cross-matched by the matching algorithm. With this definition of
isolated events, the 28 Kevents that were matched for the retinas
without any filtering were used to reconstruct the 3D coordinates of
these events, resulting in only 2.93% of isolated events. Applying the
same methodology to all the Gabor filter configurations gives the
results in the fifth row of Table 1 (Isolated events). These results
show that several configurations of Gabor filters give a smaller
proportion of isolated events.
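
The definition of an isolated event can be sketched in Python as
follows. The text does not give the size of the spatio-temporal
window, so the `radius` and `window` defaults below are placeholders,
and the function name `isolated_mask` is hypothetical. Events are
taken as rows of (x, y, t); the same test extends directly to
reconstructed 3D coordinates by adding a z column.

```python
import numpy as np


def isolated_mask(events, radius=3.0, window=0.01):
    """Flag events with no other event nearby in space and time.

    events: iterable of (x, y, t) rows.
    radius: assumed spatial half-width of the neighborhood (pixels).
    window: assumed temporal half-width of the neighborhood (seconds).
    Uses an O(n^2) pairwise comparison, adequate for a small sketch.
    """
    ev = np.asarray(events, dtype=float)
    dx = np.abs(ev[:, None, 0] - ev[None, :, 0])
    dy = np.abs(ev[:, None, 1] - ev[None, :, 1])
    dt = np.abs(ev[:, None, 2] - ev[None, :, 2])
    near = (dx <= radius) & (dy <= radius) & (dt <= window)
    np.fill_diagonal(near, False)  # an event is not its own neighbor
    return ~near.any(axis=1)


events = [
    (10, 10, 0.000),  # part of an edge
    (11, 10, 0.001),  # neighbor of the first event
    (50, 50, 0.002),  # far away in space
    (10, 11, 0.500),  # close in space but far away in time
]
mask = isolated_mask(events)
print("isolated events (%):", 100.0 * mask.mean())
```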

In order to remove the retina-noise events, it is also possible to
insert a noise removal block directly at the output of the retina
(jAER, 2007). However, this introduces a small extra latency before the
events can be processed, thus limiting event-driven stereo vision for
very high speed applications (although it can be a good solution when
timing restrictions are not too critical). The effect of Gabor filters
on noise events is also illustrated in Figure 13, where all the events
that were not part of an edge with the appropriate orientation are
removed by the filter. However, it is possible that some noise events
add their contributions together, producing noise events at the output
of the Gabor filters. Two different things can happen with these
events: (1) the stereo matching algorithm does not find a
corresponding event in the other retina; or (2) there is a single event
which satisfies all restrictions, so a 3D point will be reconstructed
from a noise event, producing a wrongly matched event, as described in
the next paragraph.

Although the object used in this first example is very simple, we must
consider the possibility that the algorithm wrongly matches some
events. In particular, a wide object can generate events
simultaneously from two distant edges: the left and the right one.
Therefore, it can happen that an event corresponding to the left edge
in Retina 1 does not have a proper partner in Retina 2, but another
event generated by the right edge in Retina 2 might satisfy all the
restrictions imposed by the matching algorithm. Figure 14 illustrates
the mechanism that produces this error. Let us assume that the 3D
object has its left and right edges located at positions A and B in 3D
space. Locations A and B produce events at $x_1^A$ and $x_1^B$ in
Retina 1, and at $x_2^A$ and $x_2^B$ in Retina 2. These events are the
projections onto the focal points $R_1$ and $R_2$ of both retinas,
activating pixels $(x_i^j, y_i^j)$ with i = 1, 2 and j = A, B.
Therefore, an event generated in Retina 1 with coordinates
$(x_1^A, y_1^A)$ should match another event generated in Retina 2 with
coordinates $(x_2^A, y_2^A)$. However, note that in Figure 14, an edge
at position D is captured by Retina 1 at the same pixel as an edge at
A, and in Retina 2 they would be on the same epipolar lines. The same
happens for edges at positions B and C. Consequently, it can happen
that no event is produced in Retina 2 at coordinate $(x_2^A, y_2^A)$
at the same time, but another event with coordinates $(x_2^B, y_2^B)$
is generated within a short time range by the opposite, simultaneously
moving edge, with those coordinates lying on the same epipolar line.
In that case, the algorithm might match $(x_1^A, y_1^A)$ with
$(x_2^B, y_2^B)$, reconstructing a wrong 3D point at coordinate D. The
opposite combination would produce a wrong 3D event at point C. This
effect could produce false edges in the 3D reconstruction, especially
when processing more complex objects. However, the introduction of the
Gabor filters to extract the orientation of the edges will reduce the
possibility of matching wrong pairs of events.

Table 2 | Comparison of the 3D reconstruction results for the ring.
The meaning of the columns and rows is as in Table 1.

FIGURE 18 | Graphical representation of Table 2. Each subplot
corresponds to a different row of the table, showing the obtained
values for each number of orientations and scale. The black horizontal
lines indicate the values obtained using algorithm A (0 orientations).

FIGURE 19 | Results obtained for the rotating ring. (A) Disparity map
reconstructed with Tframe = 50 ms corresponding to the rotation of the
ring. (B) Result of the 3D reconstruction of the same frame of the ring
recording.

In order to measure the proportion of wrongly matched events, we
consider that all the good pairs of events will follow certain patterns
of disparity, so all the events which are close in time will be
included within a certain range of disparity values. By continuously
calculating the mean and standard deviation of the distribution of
disparities, we define the range of acceptable values, and we identify
as wrongly matched all those events whose disparity is outside that
range. Using this method, we calculate the proportion of wrongly
matched events and present it (in %) in the sixth row of Table 1
($M_{err}$). Finally, the last row presents the number of correctly
matched events, obtained by subtracting both the isolated and wrongly
matched events from the total number of matched events:

$$N_{m\_correct} = N_m - \left(\frac{\text{Isolated events}}{100} \cdot N_m\right) - \left(\frac{M_{err}}{100} \cdot N_m\right)$$

All these results are presented graphically in Figure 15, where the
colored vertical bars represent the results obtained applying
algorithm B with different numbers of orientations and scales, while
the black horizontal lines indicate the values obtained using
algorithm A (no Gabor filters). From this figure, we decide that the
best case is 8 orientations and Scale 4, as it provides the largest
number of correctly matched events. However, it could also be argued
that 8 orientations and Scale 3 gives a smaller number of wrongly
matched events, but in that case the number of correctly matched events
is also smaller.
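
The disparity-based test for wrong matches and the count of correctly
matched events can be sketched in Python as follows. The text does not
specify how many recent events enter the running statistics or how
wide the acceptable range is, so `window`, `k`, `min_history`, and
`min_sigma` are assumptions, as are the function names. In the usage
example, 28 Kevents and 2.93% are the algorithm A values quoted above,
while the 10% wrong-match ratio is a placeholder, not a value taken
from Table 1.

```python
import numpy as np


def wrong_match_mask(disparities, window=50, k=3.0, min_history=5, min_sigma=1.0):
    """Flag matches whose disparity deviates from recent matches.

    disparities: disparity (pixels) of each matched pair, in time order.
    window:      assumed number of preceding matches used for statistics.
    k:           assumed half-width of the accepted range, in std units.
    min_sigma:   assumed floor on the std so constant runs are not
                 over-sensitive to one-pixel jitter.
    """
    d = np.asarray(disparities, dtype=float)
    wrong = np.zeros(d.size, dtype=bool)
    for i in range(d.size):
        recent = d[max(0, i - window):i]
        if recent.size < min_history:
            continue  # not enough history to judge this match
        sigma = max(recent.std(), min_sigma)
        wrong[i] = abs(d[i] - recent.mean()) > k * sigma
    return wrong


def correct_matches(n_matched, isolated_pct, wrong_pct):
    """Matched events minus the isolated and the wrongly matched ones."""
    return (n_matched
            - isolated_pct / 100.0 * n_matched
            - wrong_pct / 100.0 * n_matched)


disparities = [30, 31, 29, 30, 31, 30, 29, 30, 80, 30, 31]
flags = wrong_match_mask(disparities)
print("wrongly matched (%):", 100.0 * flags.mean())

# 28000 matches and 2.93% isolated are from the text; 10.0% is a placeholder.
print("correct matches:", correct_matches(28000, 2.93, 10.0))
```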

Using the sequence of matched events provided by the algorithm in the
best case (8 orientations, Scale 4), we computed the disparity map. The
underlying reasons why this configuration provides the best result are:
(a) Scale 4 better matches the scale of the object edges in this
particular case, and (b) given the object geometry and its tilting in
time, a relatively fine orientation angle detection was required. If we
compare this case with the results obtained applying algorithm A
without Gabor filters (first column in Table 1), we observe an increase
of 39% in the number of matched events, while the proportions of
isolated events and wrongly matched pairs have decreased by 65 and
2.5%, respectively. Moreover, the number of correctly matched events
has increased by 44%. In order to compute the disparity map, we
calculated the Euclidean distance between both pixels in each pair of
events (from Retina 1 and Retina 2). This measurement is inversely
proportional to the distance between the represented object and the
retinas, as farther objects produce a small disparity and closer
objects produce a large disparity value. Figure 16 shows 9 consecutive
frames of the obtained disparity sequence, with a frame time of 50 ms.
The disparity scale goes from dark blue to red to encode events from
far to close.
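
A minimal Python sketch of the disparity computation is shown below.
It takes matched pairs as rows of (t, x1, y1, x2, y2), computes the
Euclidean distance between the two pixels, and accumulates the values
into 50 ms frames indexed by the Retina 1 pixel. The row layout, the
function names, and the choice of storing the latest value per pixel
are assumptions; the 128 x 128 resolution is the visual space given in
the Figure 13 caption.

```python
import numpy as np


def disparity(x1, y1, x2, y2):
    """Euclidean pixel distance between the two events of a matched pair."""
    return np.hypot(np.asarray(x1, float) - x2, np.asarray(y1, float) - y2)


def disparity_frames(matches, frame_time=0.05, size=128):
    """Group matched pairs into frames of frame_time seconds.

    Returns a dict {frame index: size x size array}; pixels without a
    matched event in that frame are NaN.
    """
    frames = {}
    for t, x1, y1, x2, y2 in matches:
        k = int(t // frame_time)
        img = frames.setdefault(k, np.full((size, size), np.nan))
        img[int(y1), int(x1)] = disparity(x1, y1, x2, y2)
    return frames


matches = [
    (0.01, 60, 40, 30, 40),  # large disparity: close object
    (0.06, 10, 10, 13, 14),  # small disparity: far object
]
frames = disparity_frames(matches)
for k in sorted(frames):
    print("frame", k, "max disparity:", np.nanmax(frames[k]))
```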

Applying the method described in section 3D Reconstruction, the
three-dimensional coordinates of the matched events are calculated.
Figure 17 shows 9 consecutive frames of the resultant 3D
reconstruction, with a frame time of 50 ms. The shape of the pen is
clearly represented as it moves around 3D space. Using this sequence,
we manually measured the approximate length of the pen by calculating
the distance between the 3D coordinates of pairs of events located at
the upper and lower limits of the pen, respectively. This gave an
average length of 14.85 cm, compared with the real length of 14 cm,
which means an error of 0.85 cm. For an approximate distance to the
retinas of 1 m, the maximum error predicted in Figure 10 would be below
1.5%, that is, 1.5 cm. Therefore, the 0.85 cm error is smaller than the
maximum predicted by Figure 10.

Ring 

A ring with a diameter of 22 cm was rotating slowly in front of the two
retinas for half a minute, generating approximately 115 Kevents per
retina. As in the previous example, the matching algorithm was applied
both to the events generated by the retinas (see section Event
Matching, algorithm A) and to the events generated by the Gabor filters
(see section Event Matching, algorithm B), in order to compare both
methods. Table 2 shows the results for all the configurations of Gabor
filters (from 2 to 8 orientations, with scales 1-4). These results are
presented graphically in Figure 18, where the colored vertical bars
represent the results obtained applying algorithm B with different
numbers of orientations and scales, while the black horizontal lines
indicate the values obtained using algorithm A (no Gabor filters). We
can see in the table that the largest number of matched events (25 K)
is obtained for 8 orientations and both scales 2 and 3. Although the
ratio of noise events is very similar for both of them (1.9% for
Scale 2 and 2.0% for Scale 3), Scale 3 provides a smaller ratio of
wrongly matched events (7.8% for Scale 2 and 6.4% for Scale 3).
Therefore, we conclude that the best performance is found with 8
orientations and Scale 3, as it is more appropriate to the geometry of
the object. If we compare this case with the results obtained applying
algorithm A without Gabor filters (first column in Table 2), we observe
an increase of 47% in the number of matched events, while the
proportions of isolated events and wrongly matched pairs have decreased
by 66 and 46%, respectively. Therefore, the number of correctly matched
events has increased by 64%. A frame reconstruction of the disparity
map and the 3D sequence is shown in Figure 19.

Table 3 | Comparison of the 3D reconstruction results for the cube.

FIGURE 20 | Graphical representation of Table 3. Each subplot
corresponds to a different row of the table, showing the obtained
values for each number of orientations and scale. The black horizontal
lines indicate the values obtained using algorithm A (0 orientations).

The diameter of the reconstructed ring was measured manu- 
ally by selecting pairs of events with the largest possible separa- 
tion. This gave an average diameter of 21.40 cm, which implies a 
reconstruction error of 0.6 cm. This error is also smaller than the 
maximum predicted in Figure 10. 
FIGURE 21 | Results obtained for the cube. (A) Disparity map
reconstructed with Tframe = 50 ms corresponding to the rotation of the
cube. (B) Result of the 3D reconstruction of the same frame of the cube
recording.



Cube 

Finally, a cube with an edge length of 15 cm was rotating in front of
the retinas, generating approximately 118 Kevents per retina in
approximately 20 s. The same procedure performed in the previous
examples was repeated, obtaining the results shown in Table 3. These
results are presented graphically in Figure 20, where the colored
vertical bars represent the results obtained applying algorithm B with
different numbers of orientations and scales, while the black
horizontal lines indicate the values obtained using algorithm A (no
Gabor filters). In this case, the largest number of matched events
(31 K) is given by 8 orientations and Scale 3, while both the ratio of
isolated events and the ratio of wrongly matched events are very
similar for the four different scales with 8 orientations (around 3%
noise and 10.9% wrong matches). Therefore, the best performance is
given by 8 orientations and Scale 3. If we compare this case with the
results obtained applying algorithm A without Gabor filters (first
column in Table 3), we observe an increase of 181% in the number of
matched events, while the proportions of isolated events and wrongly
matched pairs have decreased by 78 and 46%, respectively. The number of
correctly matched events has increased by 350%.

A reconstruction of the disparity map and the 3D sequence is shown in
Figure 21. The ratio of wrongly matched events is much larger than in
the ring example (about twice as much). This is because the cube has
many parallel edges, which increases the number of events on the same
epipolar line that are candidates to be matched and that the
orientation filters cannot discriminate. While Figure 14 shows a
situation where 2 different positions in 3D space (A and B) can
generate events that could be wrongly matched, in this case we could
find at least 4 different positions in 3D space (as we have 4 parallel
edges) with the same properties.

The edge length of the reconstructed 3D cube was measured 
manually on the reconstructed events, giving an average length of 
16.48 cm, which implies a reconstruction error of 1.48 cm. This 
error is smaller than the maximum predicted in Figure 10. 
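
The three size checks reported in this section can be summarized with
a short Python calculation. The real and measured sizes are the values
quoted in the text; applying the same 1.5 cm bound (1.5% of a 1 m
distance, as stated for the pen) to the ring and the cube is an
assumption, since the text only says that their errors are below the
maximum predicted in Figure 10.

```python
# (real size, measured size) in cm, as reported for each object.
objects = {
    "pen": (14.0, 14.85),
    "ring": (22.0, 21.40),
    "cube": (15.0, 16.48),
}

# Assumed bound: 1.5% of an approximate 1 m distance to the retinas.
bound_cm = 0.015 * 100.0

errors = {name: abs(measured - real) for name, (real, measured) in objects.items()}
for name, err in errors.items():
    print(f"{name}: error = {err:.2f} cm, within bound: {err < bound_cm}")
```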

CONCLUSION 

This paper analyzes different strategies to improve 3D stereo 
reconstruction in event-based vision systems. First of all, a com- 
parison between stereo calibration methods showed that by using 
a calibration object with LEDs placed in known locations and 
measuring their corresponding 2D projections with sub-pixel res- 
olution, we can extract the geometric parameters of the stereo 
setup. This method was tested by reconstructing the known coor- 
dinates of the calibration object, giving a mean error comparable 
to the size of each LED. 

Event matching algorithms have been proposed for stereo
reconstruction, taking advantage of the precise timing information
provided by DVS sensors. In this work, we have explored the benefits of
using Gabor filters to extract the orientation of the object edges and
match events from pairwise filters directly. This imposes the
restriction that the distance from the stereo cameras to the objects
must be much larger than the focal length of the lenses, so that edge
orientations appear similar in both cameras. By analyzing different
numbers of filters with several spatial scales, we have shown that we
can increase the number of reconstructed events for a given sequence,
while reducing the number of both noise events and wrong matches. This
improvement has been validated by reconstructing three different
objects in 3D. The size of these objects was estimated from the 3D
reconstruction, with an error smaller than theoretically predicted by
the method (1.5%).